    Building cumulonimbus cloud systems

    Effect of slot type identification on frame length optimization

    In dense radio frequency identification (RFID) systems, reducing reading times is crucial. For tag anti-collision management, most RFID systems rely on frame slotted ALOHA (FSA). The most common method to reduce the reading time for large tag populations is optimization of the number of slots per frame. The slot duration in real RFID systems is determined by the slot type (idle, successful, or colliding). Furthermore, by detecting the strongest transponder, colliding slots can be transformed into successful slots, a phenomenon known as the capture effect. Additionally, RFID readers may be able to identify slot types at the physical layer, which shortens colliding slots because the reader can terminate them immediately instead of replying with an invalid acknowledgment and waiting for the time-out. In this paper, we provide a novel approach for analytically estimating the optimal frame length. Our approach yields a novel closed-form equation for the frame length that takes into account the durations of the different slot types, the capture effect, and the probability of slot type identification. Experimental results for FM0 encoding show that our technique reduces the total reading time by between 5.5% and 11.3% compared to methods that do not take slot type identification into account. For the Miller encoding scheme, however, the reduction in reading time is at most 9%, 6%, and 1% for M = 2, 4, and 8, respectively.
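
    The closed-form frame-length equation is the paper's contribution and is not reproduced here; purely to illustrate the trade-off it captures, the Python sketch below numerically searches for the frame length that minimizes the expected reading time per identified tag. All slot durations, the capture probability and the slot-type identification probability are made-up values, and the helper names are hypothetical.

        # Hypothetical illustration of frame-length selection in frame slotted ALOHA.
        # Slot durations, capture probability and slot-type identification probability
        # are invented values, not the ones measured in the paper.

        def expected_time_per_tag(n_tags, frame_len,
                                  t_idle=0.1, t_succ=1.0,
                                  t_coll=0.6, t_coll_short=0.2,
                                  p_capture=0.1, p_ident=0.8):
            """Expected reading time per identified tag for one frame."""
            p_empty = (1 - 1 / frame_len) ** n_tags
            p_single = n_tags / frame_len * (1 - 1 / frame_len) ** (n_tags - 1)
            p_coll = 1 - p_empty - p_single

            # A collision resolved by the capture effect behaves like a successful slot.
            p_success = p_single + p_capture * p_coll
            p_collision = (1 - p_capture) * p_coll

            # If the reader identifies the slot type at the physical layer, it can
            # abort the colliding slot early instead of waiting for the time-out.
            t_collision = p_ident * t_coll_short + (1 - p_ident) * t_coll

            frame_time = frame_len * (p_empty * t_idle
                                      + p_success * t_succ
                                      + p_collision * t_collision)
            tags_read = frame_len * p_success
            return frame_time / tags_read


        def best_frame_length(n_tags, candidates=range(8, 1025)):
            return min(candidates, key=lambda L: expected_time_per_tag(n_tags, L))


        if __name__ == "__main__":
            for n in (50, 200, 800):
                print(n, "tags -> frame length", best_frame_length(n))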

    Custom architecture for multicore audio Beamforming systems

    The audio Beamforming (BF) technique utilizes microphone arrays to extract acoustic sources recorded in a noisy environment. In this article, we propose a new approach for the rapid development of multicore BF systems. A survey of the literature reveals that the majority of such experimental and commercial audio systems are based on desktop PCs, due to their high-level programming support and potential for rapid system development. However, these approaches introduce performance bottlenecks, excessive power consumption, and increased overall cost. Systems based on DSPs require very little power, but their performance is still limited. Custom hardware solutions alleviate the aforementioned drawbacks; however, designers primarily focus on performance optimization without providing a high-level interface for system control and test. To address these problems, we propose a custom platform-independent architecture for reconfigurable audio BF systems. To evaluate our proposal, we implement our architecture as a heterogeneous multicore reconfigurable processor and map it onto FPGAs. Our approach combines the software flexibility of General-Purpose Processors (GPPs) with the computational power of multicore platforms. To evaluate our system, we compare it against a BF software application implemented on a low-power Atom 330, a mid-range Core2 Duo, and a high-end Core i3. Experimental results suggest that our proposed solution can extract up to 16 audio sources in real time under a 16-microphone setup. In contrast, under the same setup, the Atom 330 cannot extract any audio sources in real time, while the Core2 Duo and the Core i3 can process only up to 4 and 6 sources in real time, respectively. Furthermore, a Virtex4-based BF system consumes more than an order of magnitude less energy than the aforementioned GPP-based approaches. © 2013 ACM
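
    As a point of reference for what the BF kernel computes (independently of the custom architecture proposed here), the sketch below is a minimal delay-and-sum beamformer for a uniform linear microphone array; the array geometry, sample rate and steering angle are illustrative assumptions, not parameters of the evaluated system.

        import numpy as np

        # Minimal delay-and-sum beamformer for a uniform linear microphone array.
        # Geometry, sample rate and steering angle are illustrative only.
        SPEED_OF_SOUND = 343.0   # m/s
        SAMPLE_RATE = 16_000     # Hz, assumed
        MIC_SPACING = 0.05       # m between adjacent microphones, assumed


        def delay_and_sum(signals, steer_angle_deg):
            """signals: (n_mics, n_samples) array; returns the beamformed output."""
            n_mics, n_samples = signals.shape
            angle = np.deg2rad(steer_angle_deg)
            out = np.zeros(n_samples)
            for m in range(n_mics):
                # Arrival-time difference of microphone m relative to microphone 0
                # for a far-field source at the steering angle.
                delay_s = m * MIC_SPACING * np.sin(angle) / SPEED_OF_SOUND
                delay_samples = int(round(delay_s * SAMPLE_RATE))
                # Shift each channel so the target direction adds up coherently.
                out += np.roll(signals[m], -delay_samples)
            return out / n_mics


        if __name__ == "__main__":
            t = np.arange(SAMPLE_RATE) / SAMPLE_RATE          # one second of audio
            clean = np.sin(2 * np.pi * 440.0 * t)             # a 440 Hz source
            # 16 microphones hearing the same broadside source plus independent noise.
            mics = np.stack([clean + 0.5 * np.random.randn(t.size) for _ in range(16)])
            enhanced = delay_and_sum(mics, steer_angle_deg=0.0)
            print("residual noise variance:", float(np.var(enhanced - clean)))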

    Addressing GPU on-chip shared memory bank conflicts using elastic pipeline

    One of the major problems with the GPU on-chip shared memory is bank conflicts. Our analysis shows that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth nor by the shared memory latency (as long as it stays constant), but rather by the varied latencies caused by memory bank conflicts. These variable latencies create conflicts at the writeback stage of the in-order pipeline and cause pipeline stalls, degrading system throughput. Based on this observation, we investigate and propose a novel Elastic Pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed Elastic Pipeline, together with the co-designed bank-conflict-aware warp scheduling, reduces pipeline stalls by up to 64.0% (42.3% on average) and improves overall performance by up to 20.7% (13.3% on average) for representative benchmarks, at trivial hardware overhead. © 2012 The Author(s)
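
    The Elastic Pipeline itself is a hardware mechanism, but the bank conflicts it tolerates are easy to reason about in software: the sketch below computes the conflict degree of a warp's shared-memory access pattern, assuming the common 32-bank, 4-byte-word organization (the exact bank count and word size vary between GPU generations).

        from collections import Counter

        N_BANKS = 32      # assumed number of shared-memory banks
        WORD_BYTES = 4    # assumed bank word size


        def conflict_degree(byte_addresses):
            """Maximum number of threads in a warp that hit the same bank.

            A degree of 1 means the access is conflict-free; a degree of k means it
            is serialized into k transactions with varying latency, which is what
            stalls a rigid in-order pipeline at the writeback stage.
            """
            banks = [(addr // WORD_BYTES) % N_BANKS for addr in byte_addresses]
            return max(Counter(banks).values())


        if __name__ == "__main__":
            warp = range(32)
            patterns = {
                "unit stride": [4 * t for t in warp],     # consecutive words: degree 1
                "stride 2":    [8 * t for t in warp],     # every other word: degree 2
                "stride 32":   [128 * t for t in warp],   # same bank for all: degree 32
            }
            for name, addrs in patterns.items():
                print(name, "-> conflict degree", conflict_degree(addrs))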

    A Taxonomy for Large-Scale Cyber Security Attacks

    In an effort to examine the spread of large-scale cyber attacks, researchers have created various taxonomies. These taxonomies are purposefully built to facilitate the understanding and comparison of these attacks, and hence to counter their spread. Yet, existing taxonomies focus mainly on the technical aspects of the attacks, with little or no information about how to defend against them. The aim of this work is therefore to extend existing taxonomies by incorporating new features pertaining to the defense strategy, the scale of the attack, and other dimensions. We compare the proposed taxonomy with existing state-of-the-art taxonomies. We also present an analysis of 174 large cyber security attacks based on our taxonomy. Finally, we present a web tool that we developed to allow researchers to explore existing data sets of attacks and contribute new ones. We are convinced that our work will allow researchers to gain deeper insights into emerging attacks by facilitating their categorization, sharing, and analysis, which in turn boosts the defense efforts against cyber attacks.
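
    Purely as an illustration of how defense-oriented dimensions could extend an attack taxonomy, the sketch below stores and groups attack records by such dimensions; the field names and example entries are hypothetical and do not reproduce the paper's actual classification or data set.

        from dataclasses import dataclass

        # Hypothetical attack record extended with defense-oriented dimensions;
        # the fields and sample values are illustrative, not the paper's taxonomy.

        @dataclass
        class AttackRecord:
            name: str
            attack_vector: str      # e.g. "worm", "botnet/DDoS", "phishing"
            scale: str              # e.g. "single organisation", "national", "global"
            defense_strategy: str   # e.g. "patching", "filtering", "segmentation"


        def group_by(records, dimension):
            """Group records along one taxonomy dimension for comparison."""
            groups = {}
            for r in records:
                groups.setdefault(getattr(r, dimension), []).append(r.name)
            return groups


        if __name__ == "__main__":
            dataset = [
                AttackRecord("WannaCry", "worm", "global", "patching"),
                AttackRecord("Mirai", "botnet/DDoS", "global", "filtering"),
                AttackRecord("NotPetya", "worm", "global", "segmentation"),
            ]
            print(group_by(dataset, "defense_strategy"))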

    Dataflow Computing with Polymorphic Registers

    Heterogeneous systems are becoming increasingly popular for data processing. They improve the performance of simple kernels applied to large amounts of data. However, sequential data loads may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. Furthermore, through PRF customization, specific data path features are exposed to the programmer in a very convenient way. PRFs allow additional control over the register dimensions and the number of elements that can be accessed simultaneously by the computational units. This paper shows how PRFs can be integrated into dataflow computational platforms. In particular, starting from an annotated source code, we present a compiler-based methodology that automatically generates the customized PRFs and the enhanced computational kernels that efficiently exploit them.
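
    As a conceptual aid only, the following sketch models the addressing idea behind a polymorphic register file: a 2D register space carved at run time into logical registers of programmer-chosen shape, accessed block-wise. It is not the actual PRF microarchitecture, nor the compiler interface described in the paper, and all names are invented.

        import numpy as np

        class PolymorphicRegisterFile:
            """Toy model of a 2D register space partitioned into logical registers.

            Real PRFs provide parallel hardware access lanes; this sketch only
            mimics the addressing model (shape and placement chosen at run time).
            """

            def __init__(self, rows, cols):
                self.storage = np.zeros((rows, cols))
                self.regs = {}   # name -> (row, col, height, width)

            def define(self, name, row, col, height, width):
                self.regs[name] = (row, col, height, width)

            def read(self, name):
                r, c, h, w = self.regs[name]
                return self.storage[r:r + h, c:c + w]

            def write(self, name, block):
                r, c, h, w = self.regs[name]
                self.storage[r:r + h, c:c + w] = np.asarray(block).reshape(h, w)


        if __name__ == "__main__":
            prf = PolymorphicRegisterFile(64, 64)
            prf.define("tile_a", 0, 0, 8, 8)     # an 8x8 logical register
            prf.define("row_v", 8, 0, 1, 64)     # a 1x64 vector register
            prf.write("tile_a", np.arange(64))
            print(prf.read("tile_a").sum(), prf.read("row_v").shape)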

    The Case for Polymorphic Registers in Dataflow Computing

    Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve throughput by up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems.
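
    For reference, separable 2D convolution, the case study used here, factors a k×k mask into a column pass followed by a row pass, reducing the work per pixel from k² to 2k multiply-accumulates. The NumPy sketch below shows only that factorization, not the PRF-mapped kernel evaluated in the article; the kernel values and image size are arbitrary.

        import numpy as np

        def separable_conv2d(image, col_kernel, row_kernel):
            """2D convolution with a separable mask: column pass, then row pass.

            For a k x k separable mask this needs 2k multiply-accumulates per
            pixel instead of k*k, which is why the factorization pays off.
            """
            tmp = np.apply_along_axis(lambda c: np.convolve(c, col_kernel, mode="same"),
                                      axis=0, arr=image)
            return np.apply_along_axis(lambda r: np.convolve(r, row_kernel, mode="same"),
                                       axis=1, arr=tmp)


        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            img = rng.random((128, 128))
            g = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
            g /= g.sum()                         # 1D binomial (Gaussian-like) kernel
            separable = separable_conv2d(img, g, g)

            # Same result as convolving with the full 5x5 outer-product mask.
            from scipy.signal import convolve2d
            full = convolve2d(img, np.outer(g, g), mode="same")
            print("max abs difference:", np.abs(separable - full).max())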

    Compiler-Aided Methodology for Low Overhead On-line Testing

    Reliability is emerging as an important design criterion in modern systems due to increasing transient fault rates. Hardware fault-tolerance techniques, commonly used to address this, introduce high design costs. As an alternative, software Signature-Monitoring (SM) schemes based on compiler assertions are an efficient method for control-flow-error detection. Existing SM techniques do not consider application-specific information, causing unnecessary overheads. In this paper, compile-time Control-Flow-Graph (CFG) topology analysis is used to place best-suited assertions at optimal locations in the assembly code to reduce overheads. Our evaluation with representative workloads shows an increase in fault coverage with overheads close to those of Assertion-based Control-Flow Correction (ACFC), the method with the lowest overhead. Compared to ACFC, our technique improves fault coverage by 17% on average, while reducing performance overhead by 5% and power consumption by 3%, at equal code-size overhead.
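
    Signature monitoring in general assigns every basic block a compile-time signature and checks at run time that the signature accumulated along the executed path matches the block actually entered. The sketch below mimics such a check for a toy four-block control-flow graph; it illustrates the generic SM idea only, not the specific assertions or placement strategy proposed in this paper, and the graph and signatures are invented.

        # Toy control-flow signature check in the spirit of signature monitoring.
        # The CFG is A->B, A->C, B->D, C->D; signatures and the fault are invented.

        SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}

        # Designated predecessor of each non-entry block.
        PRED = {"B": "A", "C": "A", "D": "B"}

        # Compile-time XOR difference between a block and its designated predecessor.
        DIFF = {blk: SIG[blk] ^ SIG[pred] for blk, pred in PRED.items()}

        # Blocks branching into a multi-predecessor block set a run-time adjusting
        # value so the check also passes along the non-designated edge C->D.
        ADJUST = {"B": 0, "C": SIG["B"] ^ SIG["C"]}


        def run(path):
            """Walk a path of basic blocks, updating and checking the signature G."""
            g = SIG[path[0]]      # signature register initialised in the entry block
            adjust = 0
            for block in path[1:]:
                g ^= DIFF[block] ^ adjust
                if g != SIG[block]:
                    return f"control-flow error detected entering {block}"
                adjust = ADJUST.get(block, 0)
            return "path accepted"


        if __name__ == "__main__":
            print(run(["A", "B", "D"]))   # legal path, accepted
            print(run(["A", "C", "D"]))   # legal path through the other branch
            print(run(["A", "D"]))        # illegal jump A -> D is detected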

    Preliminary work on a mechanism for testing a customized architecture

    Hardware customization for scientific applications has shown great potential for reducing power consumption and increasing performance. In particular, the automatic generation of ISA extensions for General-Purpose Processors (GPPs) to accelerate domain-specific applications is an active field of research. Such domain-specific customized processors are mostly evaluated in simulation environments due to the technical and programmability issues of using real hardware; there is no automatic mechanism to test ISA extensions in a real hardware environment. In this paper we present a toolchain that can automatically identify candidate parts of the code suitable for acceleration and test them on reconfigurable hardware. We validate our toolchain using a bioinformatics application, ClustalW, obtaining an overall speed-up of over 2x.
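
    The toolchain itself is not reproduced in this listing; as a rough illustration of its first step, selecting candidate code regions for acceleration, the sketch below ranks profiled regions by their share of execution time and flags those above a threshold. The profile numbers, stage names and the threshold are hypothetical.

        # Illustrative first step of such a toolchain: rank profiled code regions and
        # flag acceleration candidates. Cycle counts and the threshold are made up;
        # the real toolchain works on ISA-extension candidates, not whole functions.

        def acceleration_candidates(profile, min_share=0.10):
            """Return regions whose share of total execution time exceeds min_share."""
            total = sum(profile.values())
            ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
            return [(name, cycles / total) for name, cycles in ranked
                    if cycles / total >= min_share]


        if __name__ == "__main__":
            # Hypothetical cycle counts for ClustalW-like stages.
            profile = {
                "pairwise_alignment": 7_200_000,
                "guide_tree": 600_000,
                "progressive_alignment": 1_900_000,
                "io_and_setup": 300_000,
            }
            for name, share in acceleration_candidates(profile):
                print(f"{name}: {share:.0%} of cycles -> candidate for hardware offload")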